Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
Authors
Abstract
In this paper we consider the contextual multi-armed bandit problem with linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we adopt mean-variance as the risk criterion, under which the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm to the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of $$O\left(\left(1 + \rho + \frac{1}{\rho}\right) d\,\ln T\,\ln \frac{K}{\delta}\sqrt{dK\,T^{1+2\epsilon}\ln \frac{1}{\delta}}\right)$$ that holds with probability 1 − δ under the mean-variance criterion with risk tolerance ρ, for any $$0 < \epsilon < \frac{1}{2}, 0 < \delta < 1$$. The empirical performance of our algorithms is demonstrated via a portfolio selection problem.
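To make the setting concrete, the following is a minimal sketch of Thompson sampling for the disjoint linear model with a mean-variance arm-selection rule. This is a hypothetical illustration, not the paper's exact algorithm: the function name, the running residual-variance estimator, and the convention MV = ρ·mean − variance are all assumptions for the sake of the example.

```python
import numpy as np

def risk_averse_linear_ts(contexts, true_thetas, noise_sd, T, rho, v=1.0, seed=0):
    """Illustrative sketch (not the paper's algorithm): Thompson sampling
    on a disjoint linear model, selecting the arm with the largest
    estimated mean-variance value, assumed here as MV = rho*mean - variance.

    contexts: callable t -> (K, d) array of feature vectors, one row per arm
    true_thetas: (K, d) ground-truth parameters used only to simulate rewards
    """
    rng = np.random.default_rng(seed)
    K, d = true_thetas.shape
    B = np.stack([np.eye(d) for _ in range(K)])  # per-arm ridge precision matrices
    f = np.zeros((K, d))                         # per-arm accumulated X^T r
    var_hat = np.zeros(K)                        # running residual-variance estimates
    n_pulls = np.zeros(K)
    total_reward = 0.0
    for t in range(T):
        X = contexts(t)
        mv = np.empty(K)
        for k in range(K):
            Binv = np.linalg.inv(B[k])
            mu = Binv @ f[k]
            # posterior sample for arm k's parameter (disjoint model)
            theta_tilde = rng.multivariate_normal(mu, v**2 * Binv)
            mean_est = X[k] @ theta_tilde
            mv[k] = rho * mean_est - var_hat[k]  # assumed mean-variance objective
        a = int(np.argmax(mv))
        r = X[a] @ true_thetas[a] + noise_sd * rng.standard_normal()
        # update statistics for the pulled arm only
        B[a] += np.outer(X[a], X[a])
        f[a] += r * X[a]
        n_pulls[a] += 1
        resid = r - X[a] @ (np.linalg.inv(B[a]) @ f[a])
        var_hat[a] += (resid**2 - var_hat[a]) / n_pulls[a]  # running average of squared residuals
        total_reward += r
    return total_reward, n_pulls
```

With a fixed context and a clear gap between arms, the sampler concentrates its pulls on the arm whose mean-variance value is larger, which is the behavior the regret bound above quantifies.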
Similar resources
The multi-armed bandit problem with covariates
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...
Multi-armed bandit problem with precedence relations
Abstract: Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. We formulate the problem as a multi-armed bandit problem with precedence relations. In Chan, Fuh and Hu (2005), a class of ...
Multi-armed bandit problem with known trend
We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different on-line problems like active learning, music and interface recommendation applications, where when an arm is sampled by the model the received re...
Combinatorial Multi-Objective Multi-Armed Bandit Problem
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMOMAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously. In this setting, the goal of the learner is to choose an action at each time, whose reward vector is a linear combination of the reward vectors of the arms in the action, to learn the set of super Par...
Multi-objective Contextual Multi-armed Bandit Problem with a Dominant Objective
In this paper, we propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective MAB problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector ...
Journal
Journal title: Journal of Systems Science and Systems Engineering
Year: 2022
ISSN: 1861-9576, 1004-3756
DOI: https://doi.org/10.1007/s11518-022-5541-9